Meta-Referential Games to Learn Compositional Learning Behaviours
Human beings use compositionality to generalise from past experiences to
novel ones. We assume that our experiences can be separated into fundamental
atomic components that can be recombined in novel ways, which supports our
ability to engage with new situations. We frame this as the ability to learn to
generalise compositionally, and we refer to behaviours that make use of this
ability as compositional learning behaviours (CLBs). A central problem in
learning CLBs is the resolution of a binding problem (BP). Resolving it is
another feat of intelligence that human beings perform with ease, but
state-of-the-art artificial agents do not. Thus, in order to build artificial
agents able to collaborate with human beings, we propose a novel benchmark to
investigate agents' abilities to exhibit CLBs by solving a domain-agnostic
version of the BP. We take inspiration from the language emergence and
grounding framework of referential games and propose a meta-learning extension
of referential games, entitled Meta-Referential Games, which we use to build
our benchmark, named the Symbolic Behaviour
Benchmark (S2B). We provide baseline results showing that our benchmark is a
compelling challenge that we hope will spur the research community towards
developing more capable artificial agents.
Comment: work in progress
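Because S2B builds on referential games, a minimal sketch of one round of a standard discriminative referential game may help readers unfamiliar with the framework. It does not implement the meta-referential extension. The hand-coded `speaker`, `listener`, and `lexicon` are hypothetical stand-ins for the learned agents and their emergent vocabulary, which in practice are neural networks trained on the shared reward.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A stimulus is a tuple of attribute values, e.g. ("red", "circle").
Obj = Tuple[str, ...]


def speaker(target: Obj, lexicon: Dict[str, str]) -> List[str]:
    """Toy speaker (hypothetical): emits one symbol per attribute value of the target."""
    return [lexicon[value] for value in target]


def listener(message: Sequence[str], candidates: Sequence[Obj],
             lexicon: Dict[str, str]) -> int:
    """Toy listener (hypothetical): picks the candidate that agrees with most symbols."""
    def score(obj: Obj) -> int:
        # Count positions where the candidate's symbol matches the received one.
        return sum(lexicon[value] == symbol for value, symbol in zip(obj, message))
    # Ties resolve to the lowest index, since max() returns the first maximum.
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


def play_round(target: Obj, distractors: Sequence[Obj],
               lexicon: Dict[str, str], rng: random.Random) -> float:
    """One discriminative round: the listener must find the target among distractors."""
    candidates = [target, *distractors]
    rng.shuffle(candidates)  # hide the target's position from the listener
    message = speaker(target, lexicon)
    guess = listener(message, candidates, lexicon)
    # Shared reward for both agents: 1 if the target was identified, else 0.
    return 1.0 if candidates[guess] == target else 0.0
```

With an injective lexicon such as `{"red": "a", "blue": "b", "circle": "c", "square": "d"}` and distractors that each differ from the target in at least one attribute, the target always receives the highest score, so `play_round(("red", "circle"), [("blue", "circle"), ("red", "square")], lexicon, random.Random(0))` returns `1.0`. In an actual referential game, the fixed lexicon is replaced by agent parameters updated from that reward.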
ETHER: Aligning Emergent Communication for Hindsight Experience Replay
Natural language instruction following is paramount to enabling collaboration
between artificial agents and human beings. Natural-language-conditioned
reinforcement learning (RL) agents have shown how properties of natural
languages, such as compositionality, can provide a strong inductive bias for
learning complex policies. Previous architectures like HIGhER combine the
benefits of language conditioning with Hindsight Experience Replay (HER) to deal
with sparse-reward environments. Yet, like HER, HIGhER relies on an oracle
predicate function to provide a feedback signal highlighting which linguistic
description is valid for which state. This reliance on an oracle limits its
applicability. Additionally, HIGhER only leverages the linguistic information
contained in successful RL trajectories, which hurts its final performance and
data efficiency. Without early successful trajectories, HIGhER is no better
than DQN, the algorithm on which it is built. In this paper, we propose the Emergent Textual
Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses
both of its limitations by means of (i) a discriminative visual referential
game, commonly studied in the subfield of Emergent Communication (EC), used
here as an unsupervised auxiliary task, and (ii) a semantic grounding scheme to
align the emergent language with the natural language of the
instruction-following benchmark. We show that the referential game's agents
give rise to an emergent artificial language that is aligned with the
natural-like language used to describe goals in the BabyAI benchmark, and that
this language is expressive enough to also describe unsuccessful RL
trajectories, thus providing feedback that lets the RL agent leverage the
structured linguistic information contained in all trajectories. Our work shows that EC is a viable
unsupervised auxiliary task for RL and provides missing pieces to make HER more
widely applicable.
Comment: work in progress
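The dependency on an oracle described in this abstract can be illustrated with a minimal sketch of hindsight instruction relabelling in the spirit of HER and HIGhER. It is not the authors' implementation: the `Transition` record, the string-valued states, and the `oracle` callable are hypothetical. The `oracle` argument stands for the oracle predicate function that the abstract says limits HIGhER's applicability, which ETHER aims to do without by describing trajectories with its aligned emergent language.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Transition:
    state: str
    action: int
    reward: float
    next_state: str
    instruction: str


# Hypothetical oracle predicate function: maps a state to a linguistic
# description that holds in it, or None if it has no valid description.
Oracle = Callable[[str], Optional[str]]


def hindsight_relabel(trajectory: List[Transition],
                      oracle: Oracle) -> List[Transition]:
    """Relabel a failed episode with an instruction its final state satisfies.

    Returns the relabelled copy to add to the replay buffer, or [] when the
    episode succeeded (nothing to relabel) or the oracle has no description.
    """
    if not trajectory or trajectory[-1].reward > 0:
        return []
    new_instruction = oracle(trajectory[-1].next_state)
    if new_instruction is None:
        return []
    # Simplifying assumption: the substituted goal is first reached at the
    # final transition, so only that step receives the sparse reward.
    relabelled = [replace(t, instruction=new_instruction, reward=0.0)
                  for t in trajectory]
    relabelled[-1] = replace(relabelled[-1], reward=1.0)
    return relabelled
```

For example, with `oracle = {"s2": "go to the red ball"}.get`, a failed two-step episode instructed to "pick up the blue key" and ending in state `"s2"` yields a relabelled copy whose transitions carry "go to the red ball", with reward `1.0` on the last step only. The original episode is left unchanged, because the frozen dataclass is copied with `dataclasses.replace`.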